New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(WIP) EncodingManager refactoring #1112
Conversation
…actEncodedValue, added MappedDecimalEncodedValue, separated parsing from EncodedValue, introduced TagParserFactory
return encoder.isBool(edge.getFlags(), FlagEncoder.K_ROUNDABOUT); | ||
} | ||
}; | ||
} catch (Exception ex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am struggling with this a bit. Why do we need the try/catch, especially with a generic exception? Has it something to do if there is a encoder stored in the weighting?
Wouldn't it be better to be able to read the state without creating an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not care about it this much. It will&must disappear. Just wanted to make this backward compatible and working
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok :)
@@ -40,7 +40,7 @@ public BBox getBox() { | |||
* to specify an explicit bounding box. TODO: support GeoJSON instead. | |||
*/ | |||
public static final LandmarkSuggestion readLandmarks(String file, LocationIndex locationIndex) throws IOException { | |||
// landmarks should be suited for all vehicles | |||
// landmarks should be suited for all profiles |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about PT :)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean public transit? In its current form A* should work and so also landmarks but not sure if it works nor if it makes it faster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was just wondering as it's stating "suited" meaning, it's a good idea to combine PT with LM. I am not sure if this is true, so I was asking.
/** | ||
* This class provides easy access to just one bit. | ||
*/ | ||
public final class BitEncodedValue extends IntEncodedValue { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, GH missed a Boolean Encoded Value, not sure if we should call it like this instead of Bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the difference here because boolean implies 1 byte and double implies 8 bytes. For Integer I just didn't find a better word yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I meant if we should call it BooleanEncodedValue instead of bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A boolean usually uses 1byte (roughly) in the JVM, so I wanted to differentiate here as we really use just 1 bit, but not that important and we can change to Bool or Boolean
Wow, this change looks impressive! Thanks! I haven't found the time to give this a full in-depth review yet, I only touched the tip of the iceberg so far. My main concern until now is the "ugliness" that you asked to ignore. I would prefer a way to get the state without provoking an exception. I think using exceptions as a "if replacement" should be avoided if possible. If not then it is what it is :). |
Yes, of course. But if statements wouldn't make it more beautiful. We need a different solution, but I'll improve here only if I can solve the followings things before: 0. tuning everything a bit more 1. having a better API for some places (e.g. simple set&get using and 'reverse' stuff) and 2. trying to import a real data set to see if this is actually fast enough and we can model something more realistic. |
this.extendedDataSize = Math.min(1, extendedDataSize / 4) * 4; | ||
} | ||
|
||
public EncodingManager add(TagParser parser) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be only allowed when not initialized?
return extendedDataSize; | ||
} | ||
|
||
// TODO should we add convenient getters like getStringProperty etc? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would make it more convenient IMHO.
* | ||
* @return the storable format that can be read via fromStorageFormatToInt | ||
*/ | ||
public final int toStorageFormat(boolean reverse, int flags, int value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are flags int and not long?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently it is an array of ints as the underlying RAMDataAccess is an array of ints and so this is the most efficient. But I'm thinking of using a byte array instead, which would be required if the GH node or way IDs exceed the 2^32 integer boundaries. The best would be if we could encapsulate this a bit with something like the IntRefs class and pass this in this method, which would also avoid the need of the public getOffset method.
return uncheckToStorageFormat(reverse, flags, value); | ||
} | ||
|
||
final int uncheckToStorageFormat(boolean reverse, int flags, int value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about unchecked?
private final double precision; | ||
|
||
/** | ||
* TODO should we really use precision here or use something like the already used 'factor'? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mhm, precision is the commonly used word for Double, so this does not sound wrong to me.
toStorageMap = new IntIntHashMap(values.size()); | ||
|
||
int index = 0; | ||
for (double val : values) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that it would change a lot, but should we sort the list of Doubles first? This would also allow us to accept Collections in general. On the other hand the current List approach might be a bit more robust against adding new elements at the end if no additional bit is required (not sure if this is actually a use case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah BTW, I think we should also check if any double occurs twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicates checking makes sense. Regarding the sorting: using Collection instead of a List is already possible now.
super(name, (int) Long.highestOneBit(values.size())); | ||
|
||
// we want to use binarySearch so we need to sort the list | ||
// TODO should we simply use a separate Map<String, Int>? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think Map might be faster for larger collections and maybe even easier to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I bet a map is only faster for relative big collections (>30 entries) ... we should benchmark to make this argument ;). "Easier to use" is not an argument here as we can hide it in this class IMO.
String getName(); | ||
|
||
// TODO Every tag parser has an EncodedValue associated but except for convenient usage in EncodingManager we currently do not need this method | ||
EncodedValue getEncodedValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it doesn't hurt but makes the code easier 👍
import com.graphhopper.routing.util.AbstractFlagEncoder; | ||
import com.graphhopper.util.EdgeIteratorState; | ||
|
||
public class TagParserFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this factory a bit of an overkill? Wouldn't it be easier to put this into classes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean with 'put this into classes'? Creating a RoundaboutTagParser and a HighwayTagParser etc? (This would introduce a lot more code/overhead IMO :))
I'm not super happy with it now but for me this factory was just convenient to avoid creating the same parsers in tests again and again.
Sounds good and makes sense. I just saw it when browsing through the code and thought I might add my two cents. |
… that IntsRef is a flexible replacement for flags
I was able to fix a few things (regarding highway filter and oneway storage) so that full Germany is now properly imported (in 6min) and possible to route via car. The following config.properties is necessary
and the following parameters are necessary and currently no bidirectional algorithm will work:
Such an uni-dir route request from south to nord takes ~6sec. Not too bad. Only a smaller fraction (~25%) of astar.runAlgo seems to be related to the new 'dynamic' FastestCarWeighting which could be probably further reduced via reusing the IntsRef for the next edges. Somehow the instructions have no street name associated. Will investigate. The applyWayTags is currently very lightweight and accounts to 0% of CPU time in the profiler. |
Sure. |
I do not understand why a new layer (GraphProcessContext) and the changes to the OSMReader is necessary, also the naming is unspecific. But creating separate classes that (if enabled) add multiple TagParsers to the EncodingManager sounds reasonable. Of course renaming TagParsers to something else is still possible IMO this responsibility sounds very similar to the old FlagEncoder concept except that there is now a need to avoid duplication (e.g. parsing&storing maxspeed is required for car and truck but should be stored only once). And this hierarchical aspect could be employed, so we end up in a class hierarchy instead of one big TagParserFactory class. |
Sorry, I was not clear on this. This is how we do it right now. I will ditch both the Reader and the GraphProcessContext and implement everything in your existing classes + the new classes for separate TagParsers. |
Ah, no problem. Sounds great, thanks :) ! |
I've rewritten the structure a bit. All TagParsers (though I have not added all yet) now reside in their respective own class: Tagparsers |
Yes, this looks like a good way to go forward. Now ỳou could remove most of the methods in TagParserFactory or maybe the whole class?
Is such a check necessary if we have the duplicate check for the EncodedValue objects already?
Yes, think so too. But let's fix more important stuff first. These lists should be made private or are the needed outside of it? BTW: in ~2 weeks I'll finally be able to increase priority and push this further faster :) |
Yes, this is now mostly done. Entire thing is of course still a work in progress.
No, you are right. I fixed some things and added most of the remaining parsers, except for a few where I was not yet sure how to handle it (namely SpatialRuleId). Cleaned up some of the files too. I will continue to work on this. |
Hi @karussell , are there in principal different encoded values for all profiles, meaning e.g. bike and bike2 have different encodedValues for average_speed and access etc? Including different tag parsers? |
To make it backward compatible average_speed and access values cannot be shared, yes. E.g. bike2 modifies the speed due to elevation and racing bike is faster, but also access is different as mtb can use paths with a more difficult grade for example. Although I can imagine that making the access shared across bikes should be possible as one can use the Weighting to just avoid these paths (but the space requirement is very low and so this optimization should be done later) In the future one could further try to calculate average_speed dynamically from other generic values but especially for non-CH use cases a precomputed value for both is faster while query. |
I have not yet created all EncodedValues for all profiles, e.g. MotorcycleFlagEncoder. Therefore I have more failing tests than before (Without my changes its 142, with them its 216). I don't know when or if I will have the time to do add every remaining profile. Would you like me to stage a pull request anyway? Sorry for all the questions. |
I'm not sure if I understand this but a PR will make it clear probably.
Sure
Yes, please. You can add [WIP]
No need for that really :) |
Hi, any update on this so far? |
I'm currently in the process of defining what should go in this PR and what not. Here are the main goals of this PR:
non-goals:
And so I will remove the AccessValue refactoring and create a separate PR, see this separate branch: https://github.com/graphhopper/graphhopper/tree/access_refactoring Also I'll remove the DefaultEdgeFilter refactoring as it is now in master. Furthermore the json refactoring won't happen in this PR: #1291 (comment) and we'll use a string based EncodedValue properties storage somehow. Now I'll try to merge your PR and rebase the changes against the access_refactoring branch. As a first step after this bigger change I'll try to fix the performance problem (currently we are 2x slower) and in parallel we could work on making more tests green. |
Currently osm tags are only processed before edges are created via parse(IntsRef ints, ReaderWay way). For some of our information we need a parser that is called after edge creation, such as parse(EdgeIteratorState edge, ReaderWay way). On a different node, the optimal case for us would be to have a possibility of defining TagParsers on the fly even from outside the graphhopper repo. This should be some sort of method like TagParserFactory.add(TagParser p) or equivalent in EncodingManager, so that we can maintain our own parsers. I think this would be optimal for other projects built on graphhopper too. |
What would be a use case? Why couldn't you do a post processing after the graph is created?
This should be already possible, but instead of TagParserFactory you would add it to the EncodingManager (?) |
This PR is too big. I'll split it into separate ones:
|
Closing in favor of #1447 |
This is a first sketch of a new approach to improve the flexibility for the edge property storage and it fixes #472, #728, #978 and probably some more issues.
For the moment just look at the nice FastestCarWeighting and how we fetch the values in the constructor that are necessary in the calcWeight method, really easy and understandable:
These values are composed upfront and are responsible just for a value (and optionally a reverse value), have a look into OSMReaderTest.testEncodedValueBasedEncodingManager for a complete test parsing an OSM file.
So all in all I was able to remove the necessity of the FlagEncoder classes. So just EncodingManager and a list of EncodedValue is necessary. And the best: this already works within the existing system and the new EncodingManager is compatible with the old.
I've created a EncodedValueFactory to create some default EncodedValues like maxspeed, surface, 'car access' etc so that useful Weightings can be created.
The main changes in this PR:
long flags
was replaced with a IntsRef that is not limited to 64bits (later on we could use this for all edge properties like distance and pointers, also there is a minor advantage that we could point directly into the storage backing array instead of copying but this is for a later PR)main goals of this PR:
non-goals:
Further work items:
merge new EncodingManager with oldtry a real world example and parse OSM with a car profile and query via the new weighting mechanismuse IntRefs instead of int flagsa directed DoubleProperty and BooleanProperty class that holds two values (one value for a direction) that can be easily 'reversed' -> ComposedEncodedValue holds multiple encoded values? support get and getReversed directly for every encodedValue?should we introduce a PropertyFilter so that a maxspeed is not parsed and calculated if the highway tag is not accepted ('rail')?move out EncodedValue.parse into something else to make support of non-OSM easierLater:
iter.setBaseNode(node, reverse=true)
and then avoid the reverse parameter in Weighting.calcWeight as well as the if clauses to fetch e.g. the forward or the reverse speed. I.e. one edge is directed API-wise and undirected storage-wise. We should do this undirected vs. directed stuff in a later step, see also (WIP) EncodingManager refactoring #1112 (comment)access
-EncodedValue with e.g. themax_speed
-EncValue?